Project Overview
Background
While high school dropout rates are declining, a staggering 32.9%[1] of U.S. college students still fail to graduate. The core priority for education leaders is to close this gap by identifying the roughly one-in-three students at risk of dropping out before they lose momentum.
In New York alone, there are over 1.9 million individuals with some college but no credential, a population that grew by 2.3%[2] in the past year.
Our Mission
This app aims to reduce academic failure by using machine learning to identify at-risk students at an early stage. By providing actionable insights, we enable educators to implement timely support strategies that keep students on the path to success.
Framework & Solutions
We provide a systematic approach to student success across three dimensions: student-level, cohort-level, and system-level actions.
Data Reference
This project uses a dataset from the UCI Machine Learning Repository to illustrate these real-world challenges. Originally created to help reduce academic attrition in higher education, the data lets us demonstrate how machine learning can flag at-risk students during their academic journey.
Created by: Qi Zhao • zhaoq23009@gmail.com
Last updated: February 09 2026
Student-Level Actions - Personalized Interventions
🔴 Critical Priority Students (Top 10)
| Student ID | Priority | Risk Score | Top Risk Factor | Recommended Action |
|---|---|---|---|---|
| #1384 | Critical | 99.88% | Tuition fees up to date | Immediate intervention recommended: contact within 48 hours |
| #2454 | Critical | 99.87% | Tuition fees up to date | Immediate intervention recommended: contact within 48 hours |
| #3883 | Critical | 99.79% | Tuition fees up to date | Immediate intervention recommended: contact within 48 hours |
| #1403 | Critical | 99.79% | Tuition fees up to date | Immediate intervention recommended: contact within 48 hours |
| #4191 | Critical | 99.75% | Tuition fees up to date | Immediate intervention recommended: contact within 48 hours |
| #888 | Critical | 99.75% | CR_Sem2 | Immediate intervention recommended: contact within 48 hours |
| #3476 | Critical | 99.75% | CR_Sem2 | Immediate intervention recommended: contact within 48 hours |
| #3961 | Critical | 99.75% | CR_Sem2 | Immediate intervention recommended: contact within 48 hours |
| #3987 | Critical | 99.74% | Tuition fees up to date | Immediate intervention recommended: contact within 48 hours |
| #2455 | Critical | 99.72% | CR_Sem2 | Immediate intervention recommended: contact within 48 hours |
🟡 Warning Priority Students (Top 10)
| Student ID | Priority | Risk Score | Top Risk Factor | Recommended Action |
|---|---|---|---|---|
| #1727 | Warning | 69.66% | Curricular units 2nd sem (approved) | Proactive monitoring recommended: monthly check-ins |
| #4174 | Warning | 69.04% | CR_Sem2 | Proactive monitoring recommended: monthly check-ins |
| #912 | Warning | 68.69% | CR_Sem1 | Proactive monitoring recommended: monthly check-ins |
| #1505 | Warning | 68.60% | Tuition fees up to date | Proactive monitoring recommended: monthly check-ins |
| #3829 | Warning | 68.36% | CR_Sem2 | Proactive monitoring recommended: monthly check-ins |
| #2842 | Warning | 68.22% | Tuition fees up to date | Proactive monitoring recommended: monthly check-ins |
| #283 | Warning | 67.53% | CR_Sem2 | Proactive monitoring recommended: monthly check-ins |
| #2694 | Warning | 67.27% | CR_Sem2 | Proactive monitoring recommended: monthly check-ins |
| #152 | Warning | 67.17% | Course | Proactive monitoring recommended: monthly check-ins |
| #3496 | Warning | 66.70% | Tuition fees up to date | Proactive monitoring recommended: monthly check-ins |
🟢 On Track Students (Top 10)
| Student ID | Priority | Risk Score | Top Risk Factor | Recommended Action |
|---|---|---|---|---|
| #1686 | On Track | 39.88% | CR_Sem1 | On track: maintain current support level |
| #207 | On Track | 39.80% | Mother's occupation | On track: maintain current support level |
| #243 | On Track | 39.66% | CR_Sem1 | On track: maintain current support level |
| #1605 | On Track | 39.42% | CR_Sem2 | On track: maintain current support level |
| #2670 | On Track | 39.11% | CR_Sem2 | On track: maintain current support level |
| #3750 | On Track | 38.99% | Course | On track: maintain current support level |
| #1416 | On Track | 38.89% | Course | On track: maintain current support level |
| #759 | On Track | 38.70% | Curricular units 2nd sem (approved) | On track: maintain current support level |
| #2247 | On Track | 38.62% | Delta_CR | On track: maintain current support level |
| #1961 | On Track | 38.51% | Tuition fees up to date | On track: maintain current support level |
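The three tiers above appear to split at roughly 70% and 40% risk: the highest Warning score shown is 69.66% and the highest On Track score is 39.88%. The sketch below shows one way such a tiering rule could look. Both cutoffs are inferred from the tables and are assumptions, not the app's confirmed settings.

```python
# Assumed cutoffs, inferred from the tables above; not confirmed settings.
CRITICAL_CUTOFF = 0.70
WARNING_CUTOFF = 0.40

ACTIONS = {
    "Critical": "Immediate intervention recommended: contact within 48 hours",
    "Warning": "Proactive monitoring recommended: monthly check-ins",
    "On Track": "On track: maintain current support level",
}

def assign_priority(risk: float) -> str:
    """Map a predicted dropout probability (0-1) to a priority tier."""
    if risk >= CRITICAL_CUTOFF:
        return "Critical"
    if risk >= WARNING_CUTOFF:
        return "Warning"
    return "On Track"

# Risk scores taken from the first row of each table above.
scores = {1384: 0.9988, 1727: 0.6966, 1686: 0.3988}
tiers = {sid: assign_priority(risk) for sid, risk in scores.items()}
for sid, tier in tiers.items():
    print(f"#{sid}: {tier} -> {ACTIONS[tier]}")
```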
What-If Analysis - Intervention Simulation
Sample Student: #1384 - Baseline Dropout Risk: 99.88%
| Scenario | Description | New Risk | Risk Reduction | Reduction % |
|---|---|---|---|---|
| Baseline | No intervention | 99.88% | — | — |
| Academic Support | Increase 2nd semester grade by +10 and evaluations by +5 | 99.86% | 0.02% | 0.0% |
| Financial Support | Set tuition fees up to date and clear debtor status | 99.72% | 0.16% | 0.2% |
| Socio-emotional Support | Increase stress/support-related features by 20% (proxy) | 99.89% | -0.01% | -0.0% |
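The last two columns can be reproduced from the baseline and the simulated risk. The sketch below assumes that "Risk Reduction" is the difference in percentage points and that "Reduction %" is that difference relative to the baseline; the source does not state either definition explicitly.

```python
def risk_reduction(baseline: float, new_risk: float) -> tuple[float, float]:
    """Return (reduction in percentage points, reduction relative to baseline in %)."""
    absolute = baseline - new_risk
    relative = absolute / baseline * 100 if baseline else 0.0
    return absolute, relative

baseline = 99.88  # student #1384, in percent
scenarios = {
    "Academic Support": 99.86,
    "Financial Support": 99.72,
    "Socio-emotional Support": 99.89,
}
results = {name: risk_reduction(baseline, new) for name, new in scenarios.items()}
for name, (points, relative) in results.items():
    print(f"{name}: {points:.2f} points, {relative:.1f}% of baseline")
```

A negative value, as in the socio-emotional scenario, means the simulated change raised the predicted risk slightly.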
Cohort-Level Actions - Resource Allocation
Course Risk Analysis
Courses ranked by average dropout risk of enrolled students.
| Rank | Course Code | Course Name | Students | Avg Risk | Priority | Recommended Action |
|---|---|---|---|---|---|---|
| #1 | 33 | Biofuel Production Technologies | 2 | 85.69% | Critical | Immediate intervention: assign dedicated advisor and increase TA support |
| #2 | 9119 | Informatics Engineering | 37 | 52.75% | High | Increase TA support and monitor weekly |
| #3 | 9991 | Management (evening) | 53 | 52.42% | High | Increase TA support and monitor weekly |
| #4 | 9003 | Agronomy | 39 | 47.49% | High | Increase TA support and monitor weekly |
| #5 | 9130 | Equinculture | 30 | 41.50% | High | Increase TA support and monitor weekly |
| #6 | 9853 | Basic Education | 37 | 40.07% | High | Increase TA support and monitor weekly |
| #7 | 9773 | Journalism and Communication | 70 | 37.34% | Medium | Monitor closely and provide supplemental resources |
| #8 | 8014 | Social Service (evening) | 50 | 37.26% | Medium | Monitor closely and provide supplemental resources |
| #9 | 171 | Animation and Multimedia Design | 45 | 36.99% | Medium | Monitor closely and provide supplemental resources |
| #10 | 9556 | Oral Hygiene | 15 | 35.38% | Medium | Monitor closely and provide supplemental resources |
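The ranking above can be sketched as a group-by over per-student risk scores. The student records below are hypothetical. The 0.40 cutoff for High fits the table (40.07% is High, 37.34% is Medium), while the 0.70 cutoff for Critical is a guess.

```python
from collections import defaultdict

# Hypothetical per-student records: (course_code, predicted dropout risk).
students = [(33, 0.90), (33, 0.81), (9119, 0.55), (9119, 0.50), (9773, 0.37)]

def course_priority(avg_risk: float) -> str:
    """Assumed cutoffs: 0.40 fits the table above, 0.70 is a guess."""
    if avg_risk >= 0.70:
        return "Critical"
    if avg_risk >= 0.40:
        return "High"
    return "Medium"

risks_by_course = defaultdict(list)
for code, risk in students:
    risks_by_course[code].append(risk)

# One (course_code, n_students, avg_risk) tuple per course, highest risk first.
ranking = sorted(
    ((code, len(r), sum(r) / len(r)) for code, r in risks_by_course.items()),
    key=lambda row: row[2],
    reverse=True,
)
for rank, (code, n, avg) in enumerate(ranking, start=1):
    print(f"#{rank} course {code}: {n} students, avg risk {avg:.2%}, {course_priority(avg)}")
```

An average over very few students, such as the two enrolled in course 33, is noisy, so a minimum-enrollment filter may be worth considering before acting on the top rank.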
Cohort Comparison - Radar Chart Analysis
With Scholarship vs Without Scholarship (5 Dimensions)
| Dimension | With Scholarship | Without Scholarship | Difference |
|---|---|---|---|
| Academic Prep | 0.38 | 0.36 | +0.02 |
| Current Success | 0.47 | 0.37 | +0.10 |
| Engagement | 0.26 | 0.26 | 0.00 |
| Financial Stress | 0.95 | 0.89 | +0.06 |
| Stability | 0.22 | 0.29 | -0.07 |
Group 1 vs Group 0 (5 Dimensions)
| Dimension | Group 1 | Group 0 | Difference |
|---|---|---|---|
| Academic Prep | 0.36 | 0.37 | -0.01 |
| Current Success | 0.34 | 0.43 | -0.09 |
| Engagement | 0.25 | 0.26 | -0.01 |
| Financial Stress | 0.88 | 0.91 | -0.03 |
| Stability | 0.31 | 0.25 | +0.06 |
Group 1 vs Group 0 (5 Dimensions)
| Dimension | Group 1 | Group 0 | Difference |
|---|---|---|---|
| Academic Prep | 0.39 | 0.37 | +0.02 |
| Current Success | 0.40 | 0.40 | 0.00 |
| Engagement | 0.26 | 0.26 | 0.00 |
| Financial Stress | 0.90 | 0.90 | 0.00 |
| Stability | 0.29 | 0.27 | +0.02 |
Entity-Driven Feedback Matrix
Natural Language Processing Analysis: Student feedback surveys analyzed using entity extraction and sentiment classification to identify actionable improvement areas.
- Success Cases (Top Right): High-frequency, positive topics - scale these practices
- Critical Issues (Bottom Right): High-frequency, negative topics - immediate intervention required
- Niche Strengths (Top Left): Low-frequency, positive topics - monitor and expand
- Minor Frustrations (Bottom Left): Low-frequency, negative topics - low priority
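The four quadrants can be expressed as a small classification rule. In this sketch the frequency cutoff, the sentiment cutoff, and the example topics are all hypothetical; the source does not say where the matrix draws its lines.

```python
def quadrant(frequency: int, sentiment: float,
             freq_cutoff: int = 20, sentiment_cutoff: float = 0.0) -> str:
    """Classify a feedback topic by mention count and mean sentiment (-1 to 1)."""
    frequent = frequency >= freq_cutoff
    positive = sentiment >= sentiment_cutoff
    if frequent and positive:
        return "Success Case"
    if frequent:
        return "Critical Issue"
    if positive:
        return "Niche Strength"
    return "Minor Frustration"

# Hypothetical topics: name -> (mentions, mean sentiment).
topics = {
    "tutoring": (45, 0.6),
    "tuition portal": (38, -0.5),
    "library": (8, 0.4),
    "parking": (5, -0.3),
}
labels = {name: quadrant(freq, sent) for name, (freq, sent) in topics.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```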
System-Level Actions - Policy Interventions
Top 10 Global Risk Factors
Features with the highest mean absolute SHAP values across all students.
| Rank | Feature | Policy Action |
|---|---|---|
| 1 | CR_Sem2 | Monitor the factor and evaluate targeted interventions |
| 2 | Tuition fees up to date | Establish emergency financial aid funding and improve payment flexibility |
| 3 | Curricular units 2nd sem (approved) | Strengthen academic support programs and targeted tutoring |
| 4 | CR_Sem1 | Monitor the factor and evaluate targeted interventions |
| 5 | Course | Monitor the factor and evaluate targeted interventions |
| 6 | Unemployment rate | Monitor the factor and evaluate targeted interventions |
| 7 | Curricular units 2nd sem (grade) | Strengthen academic support programs and targeted tutoring |
| 8 | Gender | Monitor the factor and evaluate targeted interventions |
| 9 | Age at enrollment | Monitor the factor and evaluate targeted interventions |
| 10 | Admission grade | Strengthen academic support programs and targeted tutoring |
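A global ranking of this kind averages the absolute SHAP value of each feature over all students. The sketch below shows only that aggregation step in plain Python. The SHAP matrix is made up for illustration; in practice it would come from an explainer run on the fitted model.

```python
# Hypothetical SHAP matrix: one row per student, one column per feature.
feature_names = ["CR_Sem2", "Tuition fees up to date", "Gender"]
shap_values = [
    [-0.40, 0.25, 0.02],
    [0.35, -0.10, -0.01],
    [-0.30, 0.20, 0.03],
]

n_students = len(shap_values)
# Mean absolute contribution per feature; the sign is dropped so that
# risk-raising and risk-lowering effects both count toward importance.
mean_abs = {
    name: sum(abs(row[j]) for row in shap_values) / n_students
    for j, name in enumerate(feature_names)
}
ranking = sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)
for rank, (name, value) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {value:.3f}")
```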
Longitudinal Monitoring of Student Climate
NLP-Powered Sentiment Analysis & Entity Attribution
Automated Monitoring System: Monthly sentiment scores derived from student communications (emails, forum posts, surveys) with spaCy-powered entity extraction to identify structural friction points.
Policy Simulation - Cost-Benefit Analysis
Baseline Metrics
Intervention Scenarios
Financial Aid Program
Engagement Nudge System
Intake & Quality Checks
Weekly intake → schema alignment → integrity audit → target balance & drift
Overall
Basic checks
| rows | cols | missing_cells | missing_cell_pct | duplicate_rows | has_missing | has_duplicates |
|---|---|---|---|---|---|---|
| 4424 | 37 | 0 | 0.0 | 0 | No | No |
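The columns of this summary can be computed with a few lines of plain Python over a list of records. This is a minimal sketch under the assumption that a missing cell is `None` or an empty string and that a duplicate is an exact repeat of an earlier row. The three-row batch is hypothetical.

```python
def basic_checks(rows):
    """Summarize shape, missing cells, and exact duplicate rows for a list of dict records."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    missing = sum(1 for r in rows for v in r.values() if v is None or v == "")
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(sorted(r.items()))  # order-independent fingerprint of the row
        if key in seen:
            duplicates += 1
        seen.add(key)
    total_cells = n_rows * n_cols
    return {
        "rows": n_rows,
        "cols": n_cols,
        "missing_cells": missing,
        "missing_cell_pct": round(100 * missing / total_cells, 2) if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "has_missing": "Yes" if missing else "No",
        "has_duplicates": "Yes" if duplicates else "No",
    }

# Hypothetical batch: one missing cell and one exact duplicate.
batch = [
    {"Age at enrollment": 19, "Debtor": 0, "Target": "Graduate"},
    {"Age at enrollment": 19, "Debtor": 0, "Target": "Graduate"},
    {"Age at enrollment": None, "Debtor": 1, "Target": "Dropout"},
]
report = basic_checks(batch)
print(report)
```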
Column type summary
| dtype | n_cols |
|---|---|
| Int64 | 29 |
| float64 | 7 |
| str | 1 |
Outcome distribution
No data available.
Data Quality Checks
1. Missing Data Check
No missing data issues found.
2. Outlier Check
| column | outlier_count | outlier_pct | handling | rationale |
|---|---|---|---|---|
| Age at enrollment | 156 | 3.526221 | flag_only | High age represents non-traditional students (Risk Signal) |
| Curricular units 1st sem (evaluations) | 33 | 0.745931 | flag_only | Default: Flag for monitoring without modification |
| Curricular units 2nd sem (evaluations) | 15 | 0.339060 | flag_only | Default: Flag for monitoring without modification |
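The source does not say which outlier rule produced these counts. As an illustration only, the sketch below uses the common 1.5 × IQR rule and mirrors the `flag_only` handling by reporting outliers without modifying the data. The age values are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]. The IQR rule is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical ages at enrollment, including one non-traditional student.
ages = [18, 19, 19, 20, 20, 21, 22, 23, 24, 55]
flagged = iqr_outliers(ages)
summary = {
    "column": "Age at enrollment",
    "outlier_count": len(flagged),
    "outlier_pct": 100 * len(flagged) / len(ages),
    "handling": "flag_only",  # values are reported, never dropped or clipped
}
print(summary)
```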
3. Sanity Check (Integrity Audit)
| check_name | severity | affected_rows | details |
|---|---|---|---|
| tuition_vs_scholarship_review | INFO | 46 | Scholarship=1 but tuition=0. Review: partial scholarship or payment timing? |
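A rule such as `tuition_vs_scholarship_review` can be written as a row filter that returns one audit record. The sketch below assumes the two dataset columns are coded 0/1, as the check description suggests. The sample rows are hypothetical.

```python
def tuition_vs_scholarship_review(rows):
    """Count rows where a scholarship holder is not up to date on tuition."""
    affected = [
        i for i, r in enumerate(rows)
        if r["Scholarship holder"] == 1 and r["Tuition fees up to date"] == 0
    ]
    return {
        "check_name": "tuition_vs_scholarship_review",
        "severity": "INFO",  # informational: flagged for review, not rejected
        "affected_rows": len(affected),
        "details": "Scholarship=1 but tuition=0. Review: partial scholarship or payment timing?",
    }

# Hypothetical rows.
sample = [
    {"Scholarship holder": 1, "Tuition fees up to date": 0},
    {"Scholarship holder": 1, "Tuition fees up to date": 1},
    {"Scholarship holder": 0, "Tuition fees up to date": 0},
]
audit = tuition_vs_scholarship_review(sample)
print(audit)
```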
Week 1
Basic checks
| rows | cols | missing_cells | missing_cell_pct | duplicate_rows | has_missing | has_duplicates |
|---|---|---|---|---|---|---|
| 1475 | 37 | 0 | 0.0 | 0 | No | No |
Column type summary
| dtype | n_cols |
|---|---|
| Int64 | 29 |
| float64 | 7 |
| str | 1 |
Outcome distribution
No data available.
Data Quality Checks
1. Missing Data Check
No missing data issues found (or see Overall tab).
2. Outlier Check
No outlier issues found (or see Overall tab).
3. Sanity Check (Integrity Audit)
| check_name | severity | affected_rows | details |
|---|---|---|---|
| gdp_negative | WARN | 596 | GDP < 0 (data quality issue) |
| inflation_negative | WARN | 307 | Inflation rate < 0 (deflation or data issue) |
| tuition_vs_scholarship_review | INFO | 19 | Scholarship=1 but tuition=0. Review: partial scholarship or payment timing? |
Week 2
Basic checks
| rows | cols | missing_cells | missing_cell_pct | duplicate_rows | has_missing | has_duplicates |
|---|---|---|---|---|---|---|
| 1475 | 37 | 0 | 0.0 | 0 | No | No |
Column type summary
| dtype | n_cols |
|---|---|
| Int64 | 29 |
| float64 | 7 |
| str | 1 |
Outcome distribution
No data available.
Drift check
This vs last
| Target | this_pct | last_pct | delta_pct |
|---|---|---|---|
| Graduate | 49.762712 | 48.474576 | 1.288136 |
| Dropout | 31.661017 | 33.152542 | -1.491525 |
| Enrolled | 18.576271 | 18.372881 | 0.203390 |
This vs cumulative
| Target | this_pct | week1_pct | delta_pct |
|---|---|---|---|
| Graduate | 49.762712 | 48.474576 | 1.288136 |
| Dropout | 31.661017 | 33.152542 | -1.491525 |
| Enrolled | 18.576271 | 18.372881 | 0.203390 |
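The drift columns are differences between class shares in two batches. The sketch below reproduces the Graduate, Dropout, and Enrolled deltas shown above. The class counts are back-computed from the displayed percentages and the 1,475-row batch size, so they are an inference rather than values given in the source. The alert threshold is hypothetical.

```python
def class_pct(counts):
    """Convert class counts to percentages of the batch."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

# Counts back-computed from the percentages above (1,475 rows per batch).
last_batch = {"Graduate": 715, "Dropout": 489, "Enrolled": 271}
this_batch = {"Graduate": 734, "Dropout": 467, "Enrolled": 274}

this_pct = class_pct(this_batch)
last_pct = class_pct(last_batch)
drift = {label: this_pct[label] - last_pct[label] for label in this_pct}

ALERT_POINTS = 5.0  # hypothetical threshold, in percentage points
alerts = [label for label, delta in drift.items() if abs(delta) > ALERT_POINTS]

for label, delta in drift.items():
    print(f"{label}: {this_pct[label]:.6f} vs {last_pct[label]:.6f}, delta {delta:+.6f}")
```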
Data Quality Checks
1. Missing Data Check
No missing data issues found (or see Overall tab).
2. Outlier Check
No outlier issues found (or see Overall tab).
3. Sanity Check (Integrity Audit)
| check_name | severity | affected_rows | details |
|---|---|---|---|
| gdp_negative | WARN | 572 | GDP < 0 (data quality issue) |
| inflation_negative | WARN | 316 | Inflation rate < 0 (deflation or data issue) |
| tuition_vs_scholarship_review | INFO | 14 | Scholarship=1 but tuition=0. Review: partial scholarship or payment timing? |
Week 3
Basic checks
| rows | cols | missing_cells | missing_cell_pct | duplicate_rows | has_missing | has_duplicates |
|---|---|---|---|---|---|---|
| 1474 | 37 | 0 | 0.0 | 0 | No | No |
Column type summary
| dtype | n_cols |
|---|---|
| Int64 | 29 |
| float64 | 7 |
| str | 1 |
Outcome distribution
No data available.
Drift check
This vs last
| Target | this_pct | last_pct | delta_pct |
|---|---|---|---|
| Graduate | 51.560380 | 49.762712 | 1.797668 |
| Dropout | 31.546811 | 31.661017 | -0.114206 |
| Enrolled | 16.892809 | 18.576271 | -1.683463 |
This vs cumulative
| Target | this_pct | cum(w1+w2)_pct | delta_pct |
|---|---|---|---|
| Graduate | 51.560380 | 49.118644 | 2.441736 |
| Dropout | 31.546811 | 32.406780 | -0.859968 |
| Enrolled | 16.892809 | 18.474576 | -1.581768 |
Data Quality Checks
1. Missing Data Check
No missing data issues found (or see Overall tab).
2. Outlier Check
No outlier issues found (or see Overall tab).
3. Sanity Check (Integrity Audit)
| check_name | severity | affected_rows | details |
|---|---|---|---|
| gdp_negative | WARN | 543 | GDP < 0 (data quality issue) |
| inflation_negative | WARN | 300 | Inflation rate < 0 (deflation or data issue) |
| tuition_vs_scholarship_review | INFO | 13 | Scholarship=1 but tuition=0. Review: partial scholarship or payment timing? |
Feature Profiling
Variable distributions by feature group (binary, continuous, categorical)
Data Dictionary: For coded variables, see the UCI Dataset Documentation.
Demographics
Binary variables
Continuous/Count variables
Categorical variables
Family background
Categorical variables
Financial / administrative
Binary variables
Admissions & pathway
Continuous/Count variables
Categorical variables
Academic signals (Sem 1)
Continuous/Count variables
Academic signals (Sem 2)
Continuous/Count variables
Macro context
Continuous/Count variables
Target Analysis
How different variables relate to outcomes (Graduate / Enrolled / Dropout)
Gender
Scholarship holder
Tuition fees up to date
Debtor
International
Age at enrollment
Admission grade
Curricular units 1st sem (approved)
Curricular units 1st sem (grade)
Feature Engineering
Temporal trajectories and text-derived features
Longitudinal Features
Time-series features capturing student trajectory over semesters
| Feature Name | Description | Type | Mean |
|---|---|---|---|
| Delta_Grade | Change in average grade from Semester 1 to Semester 2 | Continuous | -0.411 |
| Grade_Improvement | Binary indicator of grade improvement | Binary | 0.351 |
| Grade_Decline | Binary indicator of grade decline | Binary | 0.389 |
| CR_Sem1 | Credit Ratio Semester 1 (approved/enrolled) | Continuous | 0.727 |
| CR_Sem2 | Credit Ratio Semester 2 (approved/enrolled) | Continuous | 0.688 |
| Delta_CR | Change in Credit Ratio from Semester 1 to Semester 2 | Continuous | -0.039 |
| Completion_Collapse | Binary indicator of completion rate collapse | Binary | 0.022 |
| Fail_Crossing | Binary indicator of crossing the failure threshold | Binary | 0.044 |
| Borderline_Collapse | Binary indicator of borderline performance collapse | Binary | 0.017 |
| EvalPressure_Sem1 | Evaluation pressure Semester 1 (evaluations/enrolled) | Continuous | 1.345 |
| EvalPressure_Sem2 | Evaluation pressure Semester 2 (evaluations/enrolled) | Continuous | 1.315 |
| Delta_EvalPressure | Change in evaluation pressure between semesters | Continuous | -0.030 |
| GhostRate_Sem1 | Ghost enrollment rate Semester 1 | Continuous | 0.023 |
| GhostRate_Sem2 | Ghost enrollment rate Semester 2 | Continuous | 0.026 |
| Delta_GhostRate | Change in ghost enrollment rate | Continuous | 0.003 |
| Ghost_Worsening | N/A | Binary | 0.011 |
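Several of these features follow directly from the descriptions in the table: credit ratio is approved/enrolled, and evaluation pressure is evaluations/enrolled. The sketch below derives them for one student. Returning 0.0 when a student has no enrolled units and treating improvement or decline as a strict sign change are assumptions, and the student record is hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when there are no enrolled units (an assumption)."""
    return numerator / denominator if denominator else 0.0

def longitudinal_features(s):
    """Derive semester-over-semester features from the raw dataset columns."""
    cr1 = safe_ratio(s["Curricular units 1st sem (approved)"], s["Curricular units 1st sem (enrolled)"])
    cr2 = safe_ratio(s["Curricular units 2nd sem (approved)"], s["Curricular units 2nd sem (enrolled)"])
    ep1 = safe_ratio(s["Curricular units 1st sem (evaluations)"], s["Curricular units 1st sem (enrolled)"])
    ep2 = safe_ratio(s["Curricular units 2nd sem (evaluations)"], s["Curricular units 2nd sem (enrolled)"])
    delta_grade = s["Curricular units 2nd sem (grade)"] - s["Curricular units 1st sem (grade)"]
    return {
        "CR_Sem1": cr1,
        "CR_Sem2": cr2,
        "Delta_CR": cr2 - cr1,
        "Delta_Grade": delta_grade,
        "Grade_Improvement": int(delta_grade > 0),
        "Grade_Decline": int(delta_grade < 0),
        "EvalPressure_Sem1": ep1,
        "EvalPressure_Sem2": ep2,
        "Delta_EvalPressure": ep2 - ep1,
    }

# Hypothetical student whose completion and grades both slip in semester 2.
student = {
    "Curricular units 1st sem (enrolled)": 6,
    "Curricular units 1st sem (approved)": 5,
    "Curricular units 1st sem (evaluations)": 8,
    "Curricular units 1st sem (grade)": 13.0,
    "Curricular units 2nd sem (enrolled)": 6,
    "Curricular units 2nd sem (approved)": 3,
    "Curricular units 2nd sem (evaluations)": 9,
    "Curricular units 2nd sem (grade)": 11.5,
}
feats = longitudinal_features(student)
print(feats)
```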
NLP Features (Simulated)
Text-derived psychological and behavioral signals
| Feature Name | Description | Source | Coverage | Mean |
|---|---|---|---|---|
| Academic_Stress | Stress level indicator derived from sentiment analysis | Sentiment analysis | 70.8% | 0.206 |
| Academic_Stress_Level | Categorical stress level (Low/Medium/High) | Sentiment analysis | 70.8% | N/A |
| Home_Support_Risk | Family support risk score (0=stable, 1=at-risk) | Topic modeling | 70.8% | 0.215 |
| Subject_Specific | Subject-specific difficulties identified | Keyword extraction | 100.0% | N/A |
| Subject_Risk_Flag | Binary indicator of subject-specific difficulty | Keyword extraction | 100.0% | 0.328 |
| Subject_Difficulty_Score | Numerical difficulty score for subject | Keyword extraction | 70.8% | 0.340 |
Academic Stress Level Distribution
Sample Simulated Texts
"I'm enjoying my courses and managing time well. English writing assignments are very difficult."
Stress: Low • Home Risk: Medium • Subject: Language
"I have good support from family and doing well. Language barrier is making comprehension hard."
Stress: Low • Home Risk: Low • Subject: Language
"Some courses are harder than expected, but I'm coping. English writing assignments are very difficult."
Stress: Medium • Home Risk: Low • Subject: Language
Correlation Analysis
Examining relationships between different feature types
Type 1: Static Variables
Correlations among demographic and enrollment features
Type 2: Static vs Longitudinal
How baseline characteristics relate to performance trajectories
Type 3: Longitudinal vs NLP
Links between performance metrics and psychological indicators
Feature Importance
Identifying predictive features using multiple methods
L1 Logistic Regression
Sparse feature selection through L1 regularization
| Rank | Feature | Coefficient |
|---|---|---|
| 1 | CR_Sem2 | -1.171008 |
| 2 | Tuition fees up to date | -0.726209 |
| 3 | Curricular units 1st sem (approved) | -0.604850 |
| 4 | Curricular units 2nd sem (credited) | 0.317461 |
| 5 | Curricular units 2nd sem (approved) | -0.305634 |
| 6 | Age at enrollment | 0.258476 |
| 7 | Mother's occupation | -0.233480 |
| 8 | Scholarship holder | -0.219680 |
| 9 | Curricular units 1st sem (credited) | 0.207680 |
| 10 | International | -0.168801 |
Random Forest
Feature importance based on impurity reduction
| Rank | Feature | Importance |
|---|---|---|
| 1 | CR_Sem2 | 0.165800 |
| 2 | CR_Sem1 | 0.115122 |
| 3 | Curricular units 2nd sem (approved) | 0.111475 |
| 4 | Curricular units 2nd sem (grade) | 0.098250 |
| 5 | Tuition fees up to date | 0.063876 |
| 6 | Curricular units 1st sem (approved) | 0.057655 |
| 7 | Curricular units 1st sem (grade) | 0.040976 |
| 8 | Academic_Stress | 0.034186 |
| 9 | Age at enrollment | 0.025320 |
| 10 | Delta_CR | 0.020882 |
Combined Ranking
Consensus ranking across L1 and Random Forest methods
| Feature | Coefficient | Importance | Combined_Score | L1_Rank | RF_Rank |
|---|---|---|---|---|---|
| CR_Sem2 | -1.171008 | 0.165800 | 5.937941 | 1 | 1 |
| Tuition fees up to date | -0.726209 | 0.063876 | 3.662985 | 2 | 5 |
| Curricular units 1st sem (approved) | -0.604850 | 0.057655 | 3.053077 | 3 | 6 |
| Curricular units 2nd sem (credited) | 0.317461 | 0.003135 | 1.588870 | 4 | 36 |
| Curricular units 2nd sem (approved) | -0.305634 | 0.111475 | 1.583906 | 5 | 3 |
| Age at enrollment | 0.258476 | 0.025320 | 1.305040 | 6 | 9 |
| Mother's occupation | -0.233480 | 0.008997 | 1.171901 | 7 | 22 |
| Scholarship holder | -0.219680 | 0.005523 | 1.101163 | 8 | 28 |
| Curricular units 1st sem (credited) | 0.207680 | 0.004117 | 1.040456 | 9 | 32 |
| International | -0.168801 | 0.000000 | 0.844006 | 10 | 51 |
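The source does not state how Combined_Score is computed. To the displayed precision, the values in the table are consistent with 5 × |Coefficient| + 0.5 × Importance. The sketch below checks that reading against three rows. The two weights are inferred from the numbers, not taken from the project's code.

```python
# Weights inferred from the table above; an assumption, not a documented formula.
W_L1 = 5.0
W_RF = 0.5

def combined_score(coefficient: float, importance: float) -> float:
    """Blend L1 coefficient magnitude with Random Forest importance."""
    return W_L1 * abs(coefficient) + W_RF * importance

# (coefficient, importance, displayed Combined_Score) from the table above.
rows = {
    "CR_Sem2": (-1.171008, 0.165800, 5.937941),
    "Tuition fees up to date": (-0.726209, 0.063876, 3.662985),
    "International": (-0.168801, 0.000000, 0.844006),
}
gaps = {
    name: abs(combined_score(coef, imp) - shown)
    for name, (coef, imp, shown) in rows.items()
}
for name, gap in gaps.items():
    print(f"{name}: difference from displayed score {gap:.6f}")
```

Under this weighting the L1 coefficient dominates, which would explain why the combined order matches the L1 rank almost exactly even where the Random Forest rank is far lower.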
Model Overview
Dataset summary and modeling approach
Dataset Statistics
Target Distribution
| Outcome | Count | Percentage |
|---|---|---|
| Graduate | 2209 | 49.9% |
| Dropout | 1421 | 32.1% |
| Enrolled | 794 | 17.9% |
Class Imbalance Analysis
Handling Strategy
| Strategy | Rationale |
|---|---|
| class_weight_only | With a class ratio of 2.78:1, a conservative class_weight="balanced" approach is a reasonable baseline. This adjusts the loss contribution by class without modifying the observed data distribution. |
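The 2.78:1 ratio and the weights implied by `class_weight="balanced"` follow from the target counts above. The sketch below applies the heuristic scikit-learn documents for that setting, n_samples / (n_classes × class count), in plain Python.

```python
# Target counts from the Target Distribution table.
counts = {"Graduate": 2209, "Dropout": 1421, "Enrolled": 794}

n_samples = sum(counts.values())
n_classes = len(counts)

# Ratio of the largest class to the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

# "balanced" heuristic: rarer classes receive proportionally larger weights.
weights = {label: n_samples / (n_classes * n) for label, n in counts.items()}

print(f"imbalance ratio: {imbalance_ratio:.2f}:1")
for label, w in weights.items():
    print(f"{label}: weight {w:.3f}")
```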
Best Model Selection
| Attribute | Details |
|---|---|
| Purpose | Comparison benchmark |
| Regularization Strategy | L1/L2 regularization + tree constraints |
Model Performance Comparison
| Model | Purpose | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression (L1) | Interpretability baseline - identify core risk factors | 0.629 | 0.627 | 0.622 | 0.809 |
| Decision Tree | Rule extraction - interpretable decision logic for stakeholders | 0.668 | 0.634 | 0.635 | 0.840 |
| Random Forest | Performance benchmark - candidate primary model | 0.703 | 0.705 | 0.694 | 0.875 |
| XGBoost | Comparison benchmark | 0.705 | 0.682 | 0.689 | 0.886 |
Detailed Model Information
Logistic Regression (L1)
Regularization: L1 penalty (C=0.1) + balanced class weights
Confusion Matrix
| | Pred: Dropout | Pred: Enrolled | Pred: Graduate |
|---|---|---|---|
| True: Dropout | 175 | 73 | 36 |
| True: Enrolled | 32 | 79 | 48 |
| True: Graduate | 32 | 71 | 339 |
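The Precision, Recall, and F1-Score reported for Logistic Regression (L1) in the comparison table match macro averages computed from this confusion matrix. The sketch below shows that calculation, with rows as true classes and columns as predicted classes.

```python
def macro_metrics(cm):
    """Macro-averaged precision, recall, and F1 from a square confusion matrix."""
    k = len(cm)
    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = cm[i][i]
        actual = sum(cm[i])                           # row total: true members of class i
        predicted = sum(cm[r][i] for r in range(k))   # column total: predicted as class i
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k

# Logistic Regression (L1) confusion matrix from the table above.
cm_lr = [
    [175, 73, 36],
    [32, 79, 48],
    [32, 71, 339],
]
precision, recall, f1 = macro_metrics(cm_lr)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.629 recall=0.627 f1=0.622
```

Macro averaging weights all three classes equally, so the weakest class (the middle row, with 79 of 159 correct) pulls the headline scores down more than it would under overall accuracy.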
Top 10 Feature Importance
| Rank | Feature | Importance |
|---|---|---|
| 1 | Curricular units 2nd sem (grade) | 0.0480 |
| 2 | Curricular units 2nd sem (approved) | 0.0368 |
| 3 | Curricular units 1st sem (grade) | 0.0306 |
| 4 | Curricular units 1st sem (approved) | 0.0302 |
| 5 | Curricular units 2nd sem (evaluations) | 0.0210 |
| 6 | Age at enrollment | 0.0204 |
| 7 | Delta_Grade | 0.0174 |
| 8 | Curricular units 1st sem (evaluations) | 0.0153 |
| 9 | Application mode | 0.0067 |
| 10 | Unemployment rate | 0.0066 |
Decision Tree
Regularization: Max depth=5, min samples leaf=50
Confusion Matrix
| | Pred: Dropout | Pred: Enrolled | Pred: Graduate |
|---|---|---|---|
| True: Dropout | 172 | 75 | 37 |
| True: Enrolled | 19 | 80 | 60 |
| True: Graduate | 1 | 91 | 350 |
Top 10 Feature Importance
| Rank | Feature | Importance |
|---|---|---|
| 1 | CR_Sem2 | 0.7330 |
| 2 | Tuition fees up to date | 0.0896 |
| 3 | Delta_CR | 0.0717 |
| 4 | CR_Sem1 | 0.0364 |
| 5 | Curricular units 2nd sem (evaluations) | 0.0201 |
| 6 | GDP | 0.0127 |
| 7 | Age at enrollment | 0.0111 |
| 8 | EvalPressure_Sem1 | 0.0096 |
| 9 | Delta_EvalPressure | 0.0048 |
| 10 | Course | 0.0042 |
Random Forest
Regularization: Balanced class weights + depth and leaf constraints
Confusion Matrix
| | Pred: Dropout | Pred: Enrolled | Pred: Graduate |
|---|---|---|---|
| True: Dropout | 201 | 64 | 19 |
| True: Enrolled | 26 | 98 | 35 |
| True: Graduate | 13 | 80 | 349 |
Top 10 Feature Importance
| Rank | Feature | Importance |
|---|---|---|
| 1 | CR_Sem2 | 0.1565 |
| 2 | Curricular units 2nd sem (approved) | 0.1390 |
| 3 | CR_Sem1 | 0.1200 |
| 4 | Curricular units 2nd sem (grade) | 0.0775 |
| 5 | Curricular units 1st sem (approved) | 0.0750 |
| 6 | Tuition fees up to date | 0.0464 |
| 7 | EvalPressure_Sem2 | 0.0393 |
| 8 | Curricular units 1st sem (grade) | 0.0372 |
| 9 | Academic_Stress | 0.0352 |
| 10 | EvalPressure_Sem1 | 0.0341 |
Fairness Audit Overview
Feature: Gender
| Metric | Result | Status |
|---|---|---|
| Disparate Impact | 1.000 | Fair |
| TPR Difference | 0.000 | Fair |
| FPR Difference | 0.000 | Fair |
Feature: Scholarship holder
| Metric | Result | Status |
|---|---|---|
| Disparate Impact | 1.000 | Fair |
| TPR Difference | 0.000 | Fair |
| FPR Difference | 0.000 | Fair |
Feature: Displaced
| Metric | Result | Status |
|---|---|---|
| Disparate Impact | 1.000 | Fair |
| TPR Difference | 0.000 | Fair |
| FPR Difference | 0.000 | Fair |
Feature: International
| Metric | Result | Status |
|---|---|---|
| Disparate Impact | 1.000 | Fair |
| TPR Difference | 0.000 | Fair |
| FPR Difference | 0.000 | Fair |
Feature: Nacionality
| Metric | Result | Status |
|---|---|---|
| Disparate Impact | 1.000 | Fair |
| TPR Difference | 0.000 | Fair |
| FPR Difference | 0.000 | Fair |
Feature: Debtor
| Metric | Result | Status |
|---|---|---|
| Disparate Impact | 1.000 | Fair |
| TPR Difference | 0.000 | Fair |
| FPR Difference | 0.000 | Fair |
Feature: Educational special needs
| Metric | Result | Status |
|---|---|---|
| Disparate Impact | 1.000 | Fair |
| TPR Difference | 0.000 | Fair |
| FPR Difference | 0.000 | Fair |
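The three audit metrics can be computed from binary predictions split by a 0/1 group attribute. The sketch below uses the usual definitions: disparate impact as the ratio of selection rates, and TPR and FPR differences between groups. Treating group 0 as the reference group, the binary framing (dropout flagged or not), and the sample data are all assumptions, since the source does not document how the audit is set up.

```python
def group_rates(y_true, y_pred):
    """Selection rate, true positive rate, and false positive rate for one group."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = len(pairs)
    return {
        "selection": (tp + fp) / n if n else 0.0,
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }

def fairness_audit(y_true, y_pred, group):
    """Compare group 1 against group 0 (assumed reference group)."""
    rates = {}
    for g in (0, 1):
        idx = [i for i, value in enumerate(group) if value == g]
        rates[g] = group_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    ref, other = rates[0], rates[1]
    return {
        "disparate_impact": other["selection"] / ref["selection"] if ref["selection"] else float("nan"),
        "tpr_difference": other["tpr"] - ref["tpr"],
        "fpr_difference": other["fpr"] - ref["fpr"],
    }

# Hypothetical data: 1 = flagged as dropout risk.
group  = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
audit = fairness_audit(y_true, y_pred, group)
print(audit)
```

In this example both groups are flagged at the same rate, so disparate impact is 1.0, yet the error rates differ by 0.5. That is why the audit reports all three metrics rather than disparate impact alone.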